Gravitational tree-code on graphics processing units: implementation in CUDA
نویسندگان
چکیده
We present a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way we achieve a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. The code has a convenient user interface and is freely available for use1.
منابع مشابه
A sparse octree gravitational N-body code that runs entirely on the GPU processor
We present the implementation and performance of a new gravitational N body tree-code that is specifically designed for the graphics processing unit (GPU) . All parts of the tree-code algorithm are executed on the GPU. We present algorithms for parallel construction and traversing of sparse octrees. These algorithms are implemented in CUDA and tested on NVIDIA GPUs, but they are portable to Ope...
متن کاملNumerical Simulation of a Lead-Acid Battery Discharge Process using a Developed Framework on Graphic Processing Units
In the present work, a framework is developed for implementation of finite difference schemes on Graphic Processing Units (GPU). The framework is developed using the CUDA language and C++ template meta-programming techniques. The framework is also applicable for other numerical methods which can be represented similar to finite difference schemes such as finite volume methods on structured grid...
متن کاملDesign and Implementation of Cover Tree Algorithm on CUDA-Compatible GPU
Recently developed architecture such as Compute Unified Device Architecture (CUDA) allows us to exploit the computational power of Graphics Processing Units (GPU). In this paper we propose an algorithm for implementation of Cover tree, accelerated on the graphics processing unit (GPU). The existing algorithm for Cover Tree implementation is for single core CPU and is not suitable for applicatio...
متن کاملAutomatic C-to-CUDA Code Generation for Affine Programs
Graphics Processing Units (GPUs) offer tremendous computational power. CUDA (Compute Unified Device Architecture) provides a multi-threaded parallel programming model, facilitating high performance implementations of general-purpose computations. However, the explicitly managed memory hierarchy and multi-level parallel view make manual development of high-performance CUDA code rather complicate...
متن کاملParallel Implementation of Particle Swarm Optimization Variants Using Graphics Processing Unit Platform
There are different variants of Particle Swarm Optimization (PSO) algorithm such as Adaptive Particle Swarm Optimization (APSO) and Particle Swarm Optimization with an Aging Leader and Challengers (ALC-PSO). These algorithms improve the performance of PSO in terms of finding the best solution and accelerating the convergence speed. However, these algorithms are computationally intensive. The go...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2010